
    A Variational Perspective on Accelerated Methods in Optimization

    Accelerated gradient methods play a central role in optimization, achieving optimal rates in many settings. While many generalizations and extensions of Nesterov's original acceleration method have been proposed, it is not yet clear what the natural scope of the acceleration concept is. In this paper, we study accelerated methods from a continuous-time perspective. We show that there is a Lagrangian functional that we call the Bregman Lagrangian which generates a large class of accelerated methods in continuous time, including (but not limited to) accelerated gradient descent, its non-Euclidean extension, and accelerated higher-order gradient methods. We show that the continuous-time limit of all of these methods corresponds to traveling the same curve in spacetime at different speeds. From this perspective, Nesterov's technique and many of its generalizations can be viewed as a systematic way to go from the continuous-time curves generated by the Bregman Lagrangian to a family of discrete-time accelerated algorithms. (Comment: 38 pages. Subsumes an earlier working draft arXiv:1509.0361.)
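    For a concrete discrete-time member of the family the abstract refers to, here is a minimal sketch of Nesterov-style accelerated gradient descent on a smooth convex objective; the quadratic test function, step size, and momentum schedule below are illustrative assumptions, not taken from the paper.

        import numpy as np

        def nesterov_agd(grad, x0, step, n_iters):
            """Nesterov accelerated gradient descent with the k/(k+3) momentum schedule."""
            x = np.asarray(x0, dtype=float)
            y = x.copy()
            for k in range(n_iters):
                x_next = y - step * grad(y)                 # gradient step at the extrapolated point
                y = x_next + (k / (k + 3)) * (x_next - x)   # momentum / extrapolation step
                x = x_next
            return x

        # Illustrative usage on a quadratic f(x) = 0.5 * x^T A x - b^T x
        A = np.array([[3.0, 0.5], [0.5, 1.0]])
        b = np.array([1.0, -2.0])
        x_star = nesterov_agd(lambda x: A @ x - b, x0=np.zeros(2), step=0.25, n_iters=200)
        print(x_star)  # close to the minimizer solving A x = b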

    Sufficient Conditions for Uniform Stability of Regularization Algorithms

    In this paper, we study the stability and generalization properties of penalized empirical-risk minimization algorithms. We propose a set of properties of the penalty term that is sufficient to ensure uniform β-stability: we show that if the penalty function satisfies a suitable convexity property, then the induced regularization algorithm is uniformly β-stable. In particular, our results imply that regularization algorithms with penalty functions which are strongly convex on bounded domains are β-stable. In view of the results in [3], uniform stability implies generalization, and moreover, consistency results can be easily obtained. We apply our results to show that ℓp regularization for 1 < p ≤ 2 and elastic-net regularization are uniformly β-stable, and therefore generalize.
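    As an illustration of the penalized empirical-risk minimization setting the abstract describes, the sketch below fits an elastic-net-regularized least-squares model with a simple proximal gradient loop; the synthetic data, regularization weights, and step size are assumptions made for the example and are not from the paper.

        import numpy as np

        def elastic_net_erm(X, y, lam1, lam2, step, n_iters=500):
            """Minimize (1/n)*||X w - y||^2 + lam1*||w||_1 + lam2*||w||_2^2 by proximal gradient.

            The lam2 term makes the penalty strongly convex (the property the paper links
            to uniform stability); the non-smooth lam1 part is handled by soft-thresholding.
            """
            n, d = X.shape
            w = np.zeros(d)
            for _ in range(n_iters):
                grad_smooth = (2.0 / n) * X.T @ (X @ w - y) + 2.0 * lam2 * w
                z = w - step * grad_smooth
                w = np.sign(z) * np.maximum(np.abs(z) - step * lam1, 0.0)  # prox of lam1*||.||_1
            return w

        # Illustrative usage on synthetic data
        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 5))
        y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.normal(size=100)
        print(elastic_net_erm(X, y, lam1=0.1, lam2=0.05, step=0.05))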

    Convergence in KL Divergence of the Inexact Langevin Algorithm with Application to Score-based Generative Models

    We study the Inexact Langevin Algorithm (ILA) for sampling using an estimated score function when the target distribution satisfies a log-Sobolev inequality (LSI), motivated by Score-based Generative Modeling (SGM). We prove long-term convergence in Kullback-Leibler (KL) divergence under a sufficient assumption that the error of the score estimator has a bounded Moment Generating Function (MGF). Our assumption is weaker than an L^∞ error assumption (which is too strong to hold in practice) and stronger than an L^2 error assumption, which we show is not sufficient to guarantee convergence in general. Under the L^∞ error assumption, we additionally prove convergence in Rényi divergence, which is stronger than KL divergence. We then study how to obtain a provably accurate score estimator satisfying the bounded MGF assumption for LSI target distributions, using an estimator based on kernel density estimation. Together with the convergence results, this yields the first end-to-end convergence guarantee for ILA at the population level. Last, we generalize our convergence analysis to SGM and derive a complexity guarantee in KL divergence for data satisfying LSI under an MGF-accurate score estimator. (Comment: 36 pages.)
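    To make the sampling scheme concrete, here is a minimal sketch of an inexact (unadjusted) Langevin update driven by an estimated score; the Gaussian target, the perturbed score standing in for the "estimate", the step size, and the iteration count are assumptions for illustration only, not the paper's setup.

        import numpy as np

        def inexact_langevin(score_est, x0, step, n_iters, rng):
            """Iterate x_{k+1} = x_k + step * s_hat(x_k) + sqrt(2*step) * noise,
            where s_hat approximates the score (gradient of the log-density) of the target."""
            x = np.array(x0, dtype=float)
            samples = []
            for _ in range(n_iters):
                noise = rng.normal(size=x.shape)
                x = x + step * score_est(x) + np.sqrt(2.0 * step) * noise
                samples.append(x.copy())
            return np.array(samples)

        # Illustrative usage: the target N(0, 1) has score s(x) = -x; add a small bias
        # to mimic an estimated (inexact) score.
        rng = np.random.default_rng(0)
        score_est = lambda x: -x + 0.01  # slightly biased score estimate
        samples = inexact_langevin(score_est, x0=np.zeros(1), step=0.05, n_iters=5000, rng=rng)
        print(samples[1000:].mean(), samples[1000:].var())  # roughly 0 and 1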